Abstract

Perfect Space-Time Block Codes (PSTBCs) achieve full diversity, full rate, nonvanishing constant minimum determinant, uniform average transmitted energy per antenna, and good shaping. However, their high decoding complexity is a critical issue in practice. When Channel State Information (CSI) is available at both the transmitter and the receiver, Singular Value Decomposition (SVD) is commonly applied in a Multiple-Input Multiple-Output (MIMO) system to enhance the throughput or the performance. In this paper, two novel techniques, Perfect Coded Multiple Beamforming (PCMB) and Bit-Interleaved Coded Multiple Beamforming with Perfect Coding (BICMB-PC), are proposed, employing both PSTBCs and SVD without and with channel coding, respectively. With CSI at the transmitter (CSIT), the decoding complexity of PCMB is substantially reduced compared to a MIMO system employing PSTBC, which opens a new prospect for CSIT. In particular, because of a special property of the generation matrices, PCMB provides much lower decoding complexity than the state-of-the-art SVD-based uncoded technique in dimensions 2 and 4. Similarly, the decoding complexity of BICMB-PC is much lower than that of the state-of-the-art SVD-based coded technique in these two dimensions, and the complexity gain is greater than in the uncoded case. Moreover, these complexity reductions are achieved with only negligible or modest loss in performance.

Index Terms

MIMO, SVD, Perfect Space-Time Block Codes, Golden Code, BICMB, Constellation Precoding, Diversity, Decoding Complexity.



I. Introduction 

In a Multiple-Input Multiple-Output (MIMO) system, when Channel State Information (CSI) is available at the transmitter as well as at the receiver, beamforming techniques based on Singular Value Decomposition (SVD) are applied to achieve spatial multiplexing¹ and thereby increase the data rate, or to enhance performance [3]. Nevertheless, spatial multiplexing without channel coding results in the loss of full diversity order [4]. To overcome this diversity degradation, Bit-Interleaved Coded Multiple Beamforming (BICMB), which interleaves the bit codeword across multiple subchannels with different diversity orders, was proposed [5], [6]. BICMB achieves full diversity as long as the code rate $R_c$ and the number of employed subchannels $S$ satisfy the condition $R_c S \le 1$. Moreover, by employing constellation precoding, full diversity and full multiplexing can be achieved simultaneously for both uncoded and convolutionally coded SVD systems, at the cost of higher decoding complexity [9]-[12]. Specifically, in the uncoded case, full diversity requires that all streams be precoded, i.e., Fully Precoded Multiple Beamforming (FPMB). On the other hand, for convolutionally coded SVD systems without the condition $R_c S \le 1$, besides full precoding, i.e., Bit-Interleaved Coded Multiple Beamforming with Full Precoding (BICMB-FP), partial precoding, i.e., Bit-Interleaved Coded Multiple Beamforming with Partial Precoding (BICMB-PP), can also achieve both full diversity and full multiplexing with a properly designed combination of the convolutional code, the bit interleaver, and the constellation precoder.

¹In this paper, the term "spatial multiplexing" is used to describe the number of spatial subchannels, as in [1]. Note that the term is different from the "spatial multiplexing gain" defined in [2].

In MIMO systems, space-time coding can be employed to offer spatial diversity [3]. In [13], Perfect Space-Time Block Codes (PSTBCs) were introduced for dimensions 2, 3, 4, and 6. PSTBCs have the properties of full rate, full diversity, uniform average transmitted energy per antenna, good shaping of the constellation, and a nonvanishing constant minimum determinant for increasing spectral efficiency, which offers high coding gain. In [14], PSTBCs were generalized to any dimension. However, it was proved in [15] that particular PSTBCs, yielding increased coding gain, exist only in dimensions 2, 3, 4, and 6. Owing to these advantages, the Golden Code (GC), which is the best known PSTBC for MIMO systems with two transmit and two receive antennas [16], [17], has been incorporated into the 802.16e Worldwide Interoperability for Microwave Access (WiMAX) standard [18].

Despite these advantages, the high decoding complexity of PSTBCs, especially for large dimensions, is a critical issue for practical deployment. For the PSTBC of dimension $D \in \{2, 3, 4, 6\}$, each codeword carries $D^2$ information symbols from an M-QAM or M-HEX [19] constellation, so $M^{D^2}$ candidate points must be evaluated by exhaustive search to achieve Maximum Likelihood (ML) decoding. Therefore, the decoding complexity is proportional to $M^{D^2}$, denoted by $O(M^{D^2})$. Sphere Decoding (SD) is an alternative that attains ML performance with reduced complexity [20]. While SD reduces the average decoding complexity, its worst-case complexity is still $O(M^{D^2})$. Several techniques have been proposed to reduce the decoding complexity
of PSTBCs. In [21], an approach based on conditional ML was applied to obtain essentially ML performance with a worst-case complexity of 0{M^''^^^^). In [22]-[24], the worst-case complexity of PSTBCs was reduced to (9(M*^^~°^)^~°^) without performance degradation. In [25], a decoding technique based on Diophantine approximation was presented for GC, with a complexity of (9(M^) at the cost of a 2 dB performance loss. Suboptimal decoders for PSTBCs were discussed in [26]-[28].

In this paper, two novel techniques are proposed. The first technique, Perfect Coded Multiple Beamforming (PCMB), combines PSTBCs with multiple beamforming and achieves full diversity, full multiplexing, and full rate simultaneously, in a similar fashion to a MIMO system employing PSTBC and to FPMB. With knowledge of CSI at the transmitter (CSIT), the threaded structure of the PSTBC can be separated at the receiver, so the decoding complexity of PCMB is substantially reduced compared to a MIMO system employing PSTBC and becomes similar to that of FPMB. This result offers a new prospect for CSIT, which is mostly used to enhance either the performance or the throughput of a MIMO system. In particular, because of a special property of the generation matrices in dimensions 2 and 4, the real and imaginary parts of the received signal can be decoded separately, and therefore PCMB provides much lower decoding complexity than FPMB in these two dimensions. For instance, the worst-case decoding complexities of a MIMO system employing GC, of FPMB of dimension 2, and of Golden Coded Multiple Beamforming (GCMB), which is PCMB of dimension 2, are $O(M^{2.5})$, $O(M)$, and $O(\sqrt{M})$, respectively. The second technique, Bit-Interleaved Coded Multiple Beamforming with Perfect Coding (BICMB-PC), transmits bit-interleaved codewords of PSTBC through the multiple subchannels. BICMB-PC achieves full diversity and full multiplexing simultaneously, in a similar fashion to BICMB-FP. In dimensions 2 and 4, again owing to the special property of the generation matrices, the real and imaginary parts of the received signal can be separated, and only the part corresponding to the coded bit is required to calculate one bit metric for the Viterbi decoder. As a result, BICMB-PC achieves much lower decoding complexity than BICMB-FP, and the complexity reduction from BICMB-FP to BICMB-PC is greater than that from FPMB to PCMB in these two dimensions. For instance, the worst-case complexities for acquiring one bit metric in BICMB-FP of dimension 2 and in Bit-Interleaved Coded Multiple Beamforming with Golden Coding (BICMB-GC), which is BICMB-PC of dimension 2, are $O(M)$ and $O(\sqrt{M})$, respectively. Since the precoded part of BICMB-PP can be considered as a lower-dimensional BICMB-FP, BICMB-PC of dimensions 2 and 4 can replace the precoded part to reduce the complexity of BICMB-PP. Furthermore, these complexity reductions achieved by PCMB and BICMB-PC cause only negligible or modest loss in performance.

The remainder of this paper is organized as follows. In Section II, PCMB and BICMB-PC are described. In Sections III and IV, the diversity analysis and decoding technique of PCMB and BICMB-PC, respectively, are first presented in dimension 2 and then generalized to larger dimensions. In Section V, performance comparisons of different techniques are carried out. Finally, a conclusion is provided in Section VI.



Notations: Bold lower (upper) case letters denote vectors (matrices). The notation $\operatorname{diag}[b_1, \ldots, b_D]$ denotes a diagonal matrix with diagonal entries $b_1, \ldots, b_D$. The notations $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary parts of a complex number, respectively. The superscripts $(\cdot)^H$, $(\cdot)^T$, $(\cdot)^*$, and $\overline{(\cdot)}$ stand for the conjugate transpose, transpose, complex conjugate, and binary complement, respectively. The notation $\lceil \cdot \rceil$ denotes the ceiling function, which maps a real number to the smallest integer not less than it. The notations $\mathbb{R}^+$ and $\mathbb{C}$ stand for the sets of positive real numbers and complex numbers, respectively.



II. System Model 



A. PCMB 



Fig. 1(a) represents the structure of PCMB. The information bit sequence is first mapped by Gray encoding and modulated by M-QAM or M-HEX. Then, $D^2$ consecutive complex-valued scalar symbols are encoded into one PSTBC codeword, where $D \in \{2, 3, 4, 6\}$ is the system dimension. Hence, the PSTBC codeword $\mathbf{Z}$ is constructed as

$$
\mathbf{Z} = \sum_{v=1}^{D} \operatorname{diag}(\mathbf{G}\mathbf{x}_v)\,\mathbf{E}^{v-1}, \tag{1}
$$



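where $\mathbf{G}$ is a $D \times D$ unitary generation matrix, $\mathbf{x}_v$ is a $D \times 1$ vector whose elements are the $v$th group of $D$ input scalar symbols, and

$$
\mathbf{E} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \gamma & 0 & 0 & \cdots & 0 \end{bmatrix},
$$

where $\gamma$ is a unit-magnitude complex constant that depends on $D$ ($\gamma = i$ for $D = 2$, consistent with (11)). The specific $\mathbf{G}$ matrix and $\gamma$ for each dimension can be found in [13], [16].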
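As a sanity check of (1), the plain-Python sketch below (no external libraries) builds a $D = 2$ codeword from the Golden Code matrix $\mathbf{G}$ quoted in Section III-B with $\gamma = i$, and confirms that $\mathbf{G}$ is unitary and that $\mathbf{Z}$ has the entrywise form (11). The helper names and the unnormalized 4-QAM symbols are illustrative assumptions, not part of the paper.

```python
import math

# Dimension-2 (Golden Code) generation matrix as reproduced in Section III-B from [16].
SQ5 = math.sqrt(5)
ALPHA, BETA = (1 + SQ5) / 2, (1 - SQ5) / 2
G = [[(1 + 1j * BETA) / SQ5, (ALPHA - 1j) / SQ5],
     [(1 + 1j * ALPHA) / SQ5, (BETA - 1j) / SQ5]]
GAMMA = 1j  # gamma = i for D = 2, consistent with (11)


def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def shift_matrix(D, gamma):
    """E: ones on the first superdiagonal and gamma in the bottom-left corner."""
    E = [[0j] * D for _ in range(D)]
    for k in range(D - 1):
        E[k][k + 1] = 1
    E[D - 1][0] = gamma
    return E


def pstbc_codeword(G, xs, gamma):
    """Z = sum_v diag(G x_v) E^(v-1), as in (1); xs = [x_1, ..., x_D]."""
    D = len(G)
    E = shift_matrix(D, gamma)
    Epow = [[1 if r == c else 0 for c in range(D)] for r in range(D)]  # E^0 = I
    Z = [[0j] * D for _ in range(D)]
    for x in xs:
        Gx = [sum(G[r][k] * x[k] for k in range(D)) for r in range(D)]
        term = matmul([[Gx[r] if r == c else 0 for c in range(D)] for r in range(D)], Epow)
        Z = [[Z[r][c] + term[r][c] for c in range(D)] for r in range(D)]
        Epow = matmul(Epow, E)
    return Z


def is_unitary(A, tol=1e-12):
    """Check A A^H = I entrywise."""
    n = len(A)
    return all(abs(sum(A[r][k] * A[c][k].conjugate() for k in range(n)) - (r == c)) < tol
               for r in range(n) for c in range(n))


x1, x2 = [1 + 1j, -1 + 1j], [1 - 1j, -1 - 1j]   # illustrative 4-QAM symbols
Z = pstbc_codeword(G, [x1, x2], GAMMA)
g = lambda u, x: G[u][0] * x[0] + G[u][1] * x[1]  # g_u^T x (plain transpose, no conjugation)
expected = [[g(0, x1), g(0, x2)], [GAMMA * g(1, x2), g(1, x1)]]  # entrywise form of (11)
print(is_unitary(G))
```

The generic `pstbc_codeword` accepts any $D$, so the same helper can be reused with other generation matrices once they are taken from [13].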

The MIMO channel $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is assumed to be quasi-static, Rayleigh, and flat fading, and known by both the transmitter and the receiver, where $N_r$ and $N_t$ denote the numbers of receive and transmit antennas, respectively. The beamforming vectors are determined by the SVD of the MIMO channel, i.e., $\mathbf{H} = \mathbf{U}\mathbf{\Lambda}\mathbf{V}^H$, where $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices, and $\mathbf{\Lambda}$ is a diagonal matrix whose $s$th diagonal element, $\lambda_s \in \mathbb{R}^+$, is a singular value of $\mathbf{H}$ in decreasing order. When $S \le \min\{N_t, N_r\}$ streams are transmitted at the same time, the first $S$ columns of $\mathbf{U}$ and $\mathbf{V}$ are chosen as the beamforming matrices at the receiver and the transmitter, respectively. For a MIMO system employing PSTBC in dimension $D$, $D^2$ information symbols are transmitted through $D$ time slots. In the case of PCMB, to achieve the same rate as a MIMO system employing PSTBC, the number of streams is $S$, where $N_t = N_r = S = D \in \{2, 3, 4, 6\}$.

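The received signal is

$$
\mathbf{Y} = \mathbf{U}^H\mathbf{H}\mathbf{V}\mathbf{Z} + \mathbf{N} = \mathbf{\Lambda}\mathbf{Z} + \mathbf{N}, \tag{2}
$$

where $\mathbf{Y}$ is a $D \times D$ complex-valued matrix, and $\mathbf{N}$ is the $D \times D$ complex-valued additive white Gaussian noise matrix whose elements have zero mean and variance $N_0 = D/\mathrm{SNR}$. The elements of the channel matrix $\mathbf{H}$ are complex Gaussian with zero mean and unit variance. The total transmitted power is scaled as $D$ so that the received Signal-to-Noise Ratio (SNR) equals SNR. Note that in the case of a MIMO system employing PSTBC, the received signal is simply $\mathbf{Y} = \mathbf{H}\mathbf{Z} + \mathbf{N}$. With the knowledge of CSIT, the channel matrix $\mathbf{H}$ is thus replaced by the diagonal matrix $\mathbf{\Lambda}$ in (2).

Let $\chi$ denote the signal set of the modulation scheme, and define $X_{(u,v)}$ as the $(u,v)$th symbol in $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_D]$, where $u, v \in \{1, \ldots, D\}$. Define the one-to-one mapping from $\mathbf{X}$ to $\mathbf{Z}$ as $\mathbf{Z} = \mathcal{M}\{\mathbf{X}\}$. Therefore, the ML decoding of $\mathbf{X}$ is obtained by

$$
\hat{\mathbf{X}} = \arg\min_{X_{(u,v)} \in \chi} \left\| \mathbf{Y} - \mathbf{\Lambda}\mathcal{M}\{\mathbf{X}\} \right\|^2. \tag{3}
$$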
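The ML rule (3) can be illustrated with a brute-force search for $D = 2$. The sketch below assumes unnormalized 4-QAM, illustrative singular values, and a small fixed perturbation in place of the Gaussian noise $\mathbf{N}$; it enumerates all $4^4 = 256$ candidate symbol matrices, which is exactly the $O(M^{D^2})$ cost discussed in Section I. Function names are hypothetical.

```python
import itertools
import math

SQ5 = math.sqrt(5)
ALPHA, BETA = (1 + SQ5) / 2, (1 - SQ5) / 2
G = [[(1 + 1j * BETA) / SQ5, (ALPHA - 1j) / SQ5],
     [(1 + 1j * ALPHA) / SQ5, (BETA - 1j) / SQ5]]
QAM4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]  # unnormalized 4-QAM, for illustration


def golden_codeword(x1, x2):
    """M{X} for D = 2 in the closed form (11), with gamma = i."""
    g = lambda u, x: G[u][0] * x[0] + G[u][1] * x[1]
    return [[g(0, x1), g(0, x2)], [1j * g(1, x2), g(1, x1)]]


def ml_decode(Y, lam):
    """Exhaustive search of (3): score all M**(D*D) = 4**4 candidate X."""
    best, best_metric = None, float("inf")
    for s in itertools.product(QAM4, repeat=4):
        x1, x2 = list(s[:2]), list(s[2:])
        Z = golden_codeword(x1, x2)
        metric = sum(abs(Y[r][c] - lam[r] * Z[r][c]) ** 2
                     for r in range(2) for c in range(2))
        if metric < best_metric:
            best, best_metric = (x1, x2), metric
    return best


lam = [1.3, 0.6]                       # illustrative singular values, lambda_1 >= lambda_2
x1, x2 = [1 + 1j, -1 - 1j], [1 - 1j, -1 + 1j]
Z = golden_codeword(x1, x2)
noise = 0.1                            # small fixed perturbation standing in for N
Y = [[lam[r] * Z[r][c] + noise for c in range(2)] for r in range(2)]
print(ml_decode(Y, lam) == (x1, x2))   # True
```

Because $\mathbf{G}$ is unitary, any two distinct candidates are at least $2\lambda_2 = 1.2$ apart after scaling by $\mathbf{\Lambda}$, so the perturbation of Frobenius norm 0.2 cannot cause an error here.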
B. BICMB-PC

The structure of BICMB-PC is presented in Fig. 1(b). First, the convolutional encoder of code rate $R_c$, possibly combined with a perforation matrix for a high-rate punctured code [29], generates the bit codeword $\mathbf{c}$ from the information bits. A random bit interleaver is then applied to generate the interleaved bit sequence, which is then mapped by Gray encoding and modulated by M-QAM or M-HEX. Finally, $D^2$ consecutive complex-valued scalar symbols are encoded into one PSTBC codeword as in (1).

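Hence, the $k$th PSTBC codeword is constructed as

$$
\mathbf{Z}_k = \sum_{v=1}^{D} \operatorname{diag}(\mathbf{G}\mathbf{x}_{v,k})\,\mathbf{E}^{v-1}, \tag{4}
$$

where $\mathbf{x}_{v,k}$ is a $D \times 1$ vector whose elements are the $v$th group of $D$ input modulated scalar symbols used to construct the $k$th PSTBC codeword.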

The received signal corresponding to the $k$th PSTBC codeword is

$$
\mathbf{Y}_k = \mathbf{\Lambda}\mathbf{Z}_k + \mathbf{N}_k, \tag{5}
$$

where $\mathbf{Y}_k$, $\mathbf{Z}_k$, and $\mathbf{N}_k$ are the received symbol matrix, the PSTBC codeword, and the noise matrix corresponding to the $k$th PSTBC codeword, respectively.

The location of the coded bit $c_{k'}$ within the PSTBC codeword sequence is denoted as $k' \rightarrow (k, (m,n), j)$, where $k$, $(m,n)$, and $j$ are the index of the PSTBC codeword, the symbol position in $\mathbf{X}_k = [\mathbf{x}_{1,k}, \ldots, \mathbf{x}_{D,k}]$, and the bit position on the label of the scalar symbol $X_{(m,n),k}$, respectively. As defined in Section II-A, $\chi$ denotes the signal set of the modulation scheme. Let $\chi_b^j$ denote the subset of $\chi$ whose labels have $b \in \{0, 1\}$ in the $j$th bit position. By using the location information and the input-output relation in (5), the receiver calculates the ML bit metrics for $c_{k'} = b$ as

$$
\Gamma^{(m,n),j}(\mathbf{Y}_k, c_{k'}) = \min_{\mathbf{X} \in \eta_{c_{k'}}^{(m,n),j}} \left\| \mathbf{Y}_k - \mathbf{\Lambda}\mathcal{M}\{\mathbf{X}\} \right\|^2, \tag{6}
$$

where $\eta_b^{(m,n),j}$ is defined as

$$
\eta_b^{(m,n),j} = \left\{ \mathbf{X} : X_{(u,v)=(m,n)} \in \chi_b^j \ \text{and}\ X_{(u,v)\neq(m,n)} \in \chi \right\}.
$$
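For a small case, (6) can be evaluated directly. The sketch below assumes $D = 2$, a Gray-labelled 4-QAM alphabet chosen for illustration (label bit 0 gives the sign of the real part, bit 1 the sign of the imaginary part), and a noiseless received matrix; the function name `bit_metric` is hypothetical. It restricts the search to $\eta_b^{(m,n),j}$ by allowing only $\chi_b^j$ at position $(m,n)$, with 0-based indices.

```python
import itertools
import math

SQ5 = math.sqrt(5)
ALPHA, BETA = (1 + SQ5) / 2, (1 - SQ5) / 2
G = [[(1 + 1j * BETA) / SQ5, (ALPHA - 1j) / SQ5],
     [(1 + 1j * ALPHA) / SQ5, (BETA - 1j) / SQ5]]

# Illustrative Gray-labelled 4-QAM: label bit 0 -> sign of real part, bit 1 -> sign of imaginary part.
LABELS = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}


def golden_codeword(x1, x2):
    """M{X} for D = 2, closed form (11) with gamma = i."""
    g = lambda u, x: G[u][0] * x[0] + G[u][1] * x[1]
    return [[g(0, x1), g(0, x2)], [1j * g(1, x2), g(1, x1)]]


def chi(j=None, b=None):
    """Whole signal set chi, or the subset chi_b^j whose label has bit b at position j."""
    return [s for lab, s in LABELS.items() if j is None or lab[j] == b]


def bit_metric(Y, lam, m, n, j, b):
    """Gamma^{(m,n),j}(Y, b) of (6); 0-based (m, n), so X_(m,n) is entry m of x_n."""
    sets = {(u, v): (chi(j, b) if (u, v) == (m, n) else chi())
            for u in range(2) for v in range(2)}
    best = float("inf")
    for x11, x21, x12, x22 in itertools.product(sets[0, 0], sets[1, 0], sets[0, 1], sets[1, 1]):
        Z = golden_codeword([x11, x21], [x12, x22])   # x_1 = [x11, x21], x_2 = [x12, x22]
        best = min(best, sum(abs(Y[r][c] - lam[r] * Z[r][c]) ** 2
                             for r in range(2) for c in range(2)))
    return best


lam = [1.3, 0.6]                                # illustrative singular values
x1, x2 = [1 + 1j, -1 + 1j], [1 - 1j, -1 - 1j]
Z = golden_codeword(x1, x2)
Y = [[lam[r] * Z[r][c] for c in range(2)] for r in range(2)]   # noiseless, for illustration
# X_(1,1) = 1+1j has label (0, 0), so its bit 0 gives Gamma(b=0) = 0 and Gamma(b=1) > 0.
print(bit_metric(Y, lam, 0, 0, 0, 0), bit_metric(Y, lam, 0, 0, 0, 1))
```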



Finally, the ML decoder, which uses soft-input Viterbi decoding [30] to find the codeword with the minimum sum weight, makes decisions according to the rule given in [31] as

$$
\hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \sum_{k'} \Gamma^{(m,n),j}(\mathbf{Y}_k, c_{k'}). \tag{7}
$$
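The decision rule (7) is what a soft-input Viterbi decoder computes when each branch metric is the sum of the bit metrics of its coded bits. The sketch below is a generic example rather than the paper's configuration: it assumes a rate-1/2, memory-2 code with octal generators (7, 5), zero termination, and bit metrics already de-interleaved into code order; the toy metrics (0 for agreement, 1 otherwise) stand in for (6).

```python
def conv_encode(bits):
    """Rate-1/2, memory-2 convolutional code with octal generators (7, 5), zero-terminated.

    Illustrative choice only; the code used in the paper is not specified here.
    """
    s1 = s2 = 0                       # shift-register contents (s1 = most recent input)
    out = []
    for u in list(bits) + [0, 0]:     # two tail bits return the encoder to state 0
        out += [u ^ s1 ^ s2, u ^ s2]  # generator 7 = 111, generator 5 = 101
        s1, s2 = u, s1
    return out


def viterbi(metrics, n_info):
    """Soft-input Viterbi search for (7).

    metrics[i] = (Gamma for c_i = 0, Gamma for c_i = 1) for the i-th coded bit,
    already de-interleaved into code order. Returns the information bits of the
    trellis path whose summed bit metrics are smallest.
    """
    path_metric = {(0, 0): 0.0}
    survivors = {(0, 0): []}
    for t in range(n_info + 2):
        inputs = (0, 1) if t < n_info else (0,)   # tail section: input forced to 0
        new_metric, new_surv = {}, {}
        for (s1, s2), pm in path_metric.items():
            for u in inputs:
                c0, c1 = u ^ s1 ^ s2, u ^ s2
                cand = pm + metrics[2 * t][c0] + metrics[2 * t + 1][c1]
                nxt = (u, s1)
                if cand < new_metric.get(nxt, float("inf")):
                    new_metric[nxt] = cand
                    new_surv[nxt] = survivors[(s1, s2)] + [u]
        path_metric, survivors = new_metric, new_surv
    return survivors[(0, 0)][:n_info]


info = [1, 0, 1, 1, 0, 0, 1]
code = conv_encode(info)
metrics = [(float(c != 0), float(c != 1)) for c in code]   # toy bit metrics
metrics[3] = metrics[3][::-1]                               # corrupt one coded bit's metrics
print(viterbi(metrics, len(info)) == info)                  # True
```

With free distance 5, any competing terminated path disagrees with the corrupted metrics in at least four positions, so the single corrupted bit is corrected.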



III. PCMB 

In this section, the diversity and decoding complexity analyses of GCMB, which is PCMB of dimension 2, are first carried out in Section III-A and Section III-B, respectively. Then, they are generalized to larger dimensions in Section III-C. More discussion is provided in Section III-D.



A. Diversity Analysis 

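For ML decoding, the instantaneous Pairwise Error Probability (PEP) between the transmitted codeword $\mathbf{X}$ and the detected codeword $\hat{\mathbf{X}}$ is represented as

$$
\Pr(\mathbf{X} \rightarrow \hat{\mathbf{X}} \mid \mathbf{H}) = \Pr\left( \|\mathbf{Y} - \mathbf{\Lambda}\mathbf{Z}\|^2 \ge \|\mathbf{Y} - \mathbf{\Lambda}\hat{\mathbf{Z}}\|^2 \,\middle|\, \mathbf{H} \right) = \Pr\left( \epsilon \ge \|\mathbf{\Lambda}(\mathbf{Z} - \hat{\mathbf{Z}})\|^2 \,\middle|\, \mathbf{H} \right), \tag{8}
$$

where $\hat{\mathbf{Z}} = \mathcal{M}\{\hat{\mathbf{X}}\}$ and $\epsilon = \mathrm{Tr}\{-(\mathbf{Z} - \hat{\mathbf{Z}})^H\mathbf{\Lambda}^H\mathbf{N} - \mathbf{N}^H\mathbf{\Lambda}(\mathbf{Z} - \hat{\mathbf{Z}})\}$. Since $\epsilon$ is a zero-mean Gaussian random variable with variance $2N_0\|\mathbf{\Lambda}(\mathbf{Z} - \hat{\mathbf{Z}})\|^2$, (8) is given by the Q function as

$$
\Pr(\mathbf{X} \rightarrow \hat{\mathbf{X}} \mid \mathbf{H}) = Q\left( \sqrt{\frac{\|\mathbf{\Lambda}(\mathbf{Z} - \hat{\mathbf{Z}})\|^2}{2N_0}} \right). \tag{9}
$$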



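By using the upper bound on the Q function $Q(x) \le \frac{1}{2}e^{-x^2/2}$, the average PEP can be upper bounded as

$$
\Pr(\mathbf{X} \rightarrow \hat{\mathbf{X}}) = E\left[ \Pr(\mathbf{X} \rightarrow \hat{\mathbf{X}} \mid \mathbf{H}) \right] \le E\left[ \frac{1}{2}\exp\left( -\frac{\|\mathbf{\Lambda}(\mathbf{Z} - \hat{\mathbf{Z}})\|^2}{4N_0} \right) \right]. \tag{10}
$$

Let $\mathbf{g}_u^T$ with $u \in \{1, 2\}$ denote the $u$th row of $\mathbf{G}$. Then, equation (1) can be rewritten as

$$
\mathbf{Z} = \begin{bmatrix} \mathbf{g}_1^T\mathbf{x}_1 & \mathbf{g}_1^T\mathbf{x}_2 \\ i\,\mathbf{g}_2^T\mathbf{x}_2 & \mathbf{g}_2^T\mathbf{x}_1 \end{bmatrix}. \tag{11}
$$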



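Therefore,

$$
\mathbf{\Lambda}\mathbf{Z} = \begin{bmatrix} \lambda_1\mathbf{g}_1^T\mathbf{x}_1 & \lambda_1\mathbf{g}_1^T\mathbf{x}_2 \\ i\lambda_2\mathbf{g}_2^T\mathbf{x}_2 & \lambda_2\mathbf{g}_2^T\mathbf{x}_1 \end{bmatrix}. \tag{12}
$$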



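Then,

$$
\|\mathbf{\Lambda}\mathbf{Z}\|^2 = \mathrm{Tr}\{\mathbf{Z}^H\mathbf{\Lambda}^H\mathbf{\Lambda}\mathbf{Z}\} = \sum_{u=1}^{D} \lambda_u^2 \sum_{v=1}^{D} \left| \mathbf{g}_u^T\mathbf{x}_v \right|^2, \tag{13}
$$

where $D = 2$ for the purposes of (13)-(16) in this subsection. As will be discussed later, (13)-(16) are actually valid for larger values of $D$ as well. Let $\hat{\mathbf{x}}_1$ and $\hat{\mathbf{x}}_2$ denote the detected symbol vectors. By replacing $\mathbf{x}_1$ and $\mathbf{x}_2$ in (13) by $\mathbf{x}_1 - \hat{\mathbf{x}}_1$ and $\mathbf{x}_2 - \hat{\mathbf{x}}_2$, (10) is then rewritten as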



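$$
\Pr(\mathbf{X} \rightarrow \hat{\mathbf{X}}) \le E\left[ \frac{1}{2}\exp\left( -\frac{\sum_{u=1}^{D} \rho_u \lambda_u^2}{4N_0} \right) \right], \tag{14}
$$

where

$$
\rho_u = \sum_{v=1}^{D} \left| \mathbf{g}_u^T(\mathbf{x}_v - \hat{\mathbf{x}}_v) \right|^2. \tag{15}
$$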



The upper bound in (14) can be further bounded by employing a theorem from [32], which is given below.

Theorem: Consider the largest $S \le \min(N_t, N_r)$ eigenvalues $\mu_s$ of the uncorrelated central $N_r \times N_t$ Wishart matrix, sorted in decreasing order, and a weight vector $\boldsymbol{\rho} = [\rho_1, \ldots, \rho_S]^T$ with nonnegative real elements. In the high SNR regime, an upper bound for the expression $E\left[\exp\left(-\gamma \sum_{s=1}^{S} \rho_s \mu_s\right)\right]$, which is used in the diversity analysis of a number of MIMO systems, is

$$
E\left[ \exp\left( -\gamma \sum_{s=1}^{S} \rho_s \mu_s \right) \right] \le \zeta \left( \rho_{\min}\gamma \right)^{-(N_r - \delta + 1)(N_t - \delta + 1)},
$$

where $\gamma$ is the SNR, $\zeta$ is a constant, $\rho_{\min} = \min_{\rho_i \neq 0}\{\rho_i\}_{i=1}^{S}$, and $\delta$ is the index of the first nonzero element in the weight vector $\boldsymbol{\rho}$.

Proof: See [32]. □
Based on the aforementioned theorem, full diversity is achieved if and only if $\delta = 1$, which is equivalent to $\rho_1 > 0$. Note that $\rho_1 > 0$ in (15) because all elements in $\mathbf{g}_1^T$ are nonzero [16], and therefore $\delta = 1$. By applying the Theorem to (14), an upper bound of the PEP is

$$
\Pr(\mathbf{X} \rightarrow \hat{\mathbf{X}}) \le \zeta \left( \frac{\rho_{\min}\,\mathrm{SNR}}{4D} \right)^{-N_r N_t}. \tag{16}
$$

Since $N_t = N_r = D = 2$ in this case, GCMB achieves the full diversity order of 4.
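The positivity of $\rho_1$ can be spot-checked numerically. The sketch below assumes unnormalized 4-QAM, so that symbol differences lie in $\{-2, 0, 2\} + i\{-2, 0, 2\}$, and computes the smallest $|\mathbf{g}_u^T\mathbf{d}|^2$ over all nonzero difference vectors $\mathbf{d}$. Since $\rho_u$ in (15) contains at least one such term for any error event, a positive minimum implies $\rho_u > 0$ for this constellation. It is a finite check, not a proof for general $M$; the function name is hypothetical.

```python
import itertools
import math

SQ5 = math.sqrt(5)
ALPHA, BETA = (1 + SQ5) / 2, (1 - SQ5) / 2
G = [[(1 + 1j * BETA) / SQ5, (ALPHA - 1j) / SQ5],
     [(1 + 1j * ALPHA) / SQ5, (BETA - 1j) / SQ5]]

# All differences between two (unnormalized) 4-QAM points: {-2, 0, 2} + i{-2, 0, 2}.
DIFFS = [complex(a, b) for a in (-2, 0, 2) for b in (-2, 0, 2)]


def min_thread_gain(u):
    """Smallest |g_u^T d|^2 over nonzero difference vectors d = x_v - x_hat_v (0-based u)."""
    return min(abs(G[u][0] * d0 + G[u][1] * d1) ** 2
               for d0, d1 in itertools.product(DIFFS, repeat=2)
               if d0 != 0 or d1 != 0)


# rho_u in (15) sums such terms over v, and an error event has at least one nonzero d_v,
# so rho_u >= min_thread_gain(u) whenever X != X_hat.
print(min_thread_gain(0), min_thread_gain(1))
```

Both minima come out at about 0.42, well away from zero, for this constellation.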



B. Decoding 

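Equation (12) shows that each element of $\mathbf{\Lambda}\mathbf{Z}$ is related to only $\mathbf{x}_1$ or only $\mathbf{x}_2$. Consequently, the elements of $\mathbf{\Lambda}\mathbf{Z}$ can be divided into two groups, where the first and second groups contain the elements related to $\mathbf{x}_1$ and $\mathbf{x}_2$, respectively. The input-output relation in (2) is then decomposed into two equations as

$$
\begin{bmatrix} y_{(1,1)} \\ y_{(2,2)} \end{bmatrix} = \begin{bmatrix} \lambda_1\mathbf{g}_1^T\mathbf{x}_1 \\ \lambda_2\mathbf{g}_2^T\mathbf{x}_1 \end{bmatrix} + \begin{bmatrix} n_{(1,1)} \\ n_{(2,2)} \end{bmatrix}, \qquad \begin{bmatrix} y_{(1,2)} \\ y_{(2,1)} \end{bmatrix} = \begin{bmatrix} \lambda_1\mathbf{g}_1^T\mathbf{x}_2 \\ i\lambda_2\mathbf{g}_2^T\mathbf{x}_2 \end{bmatrix} + \begin{bmatrix} n_{(1,2)} \\ n_{(2,1)} \end{bmatrix}, \tag{17}
$$

where $y_{(m,n)}$ and $n_{(m,n)}$ denote the $(m,n)$th elements of $\mathbf{Y}$ and $\mathbf{N}$, respectively. Let $\mathbf{y}_1 = [y_{(1,1)}, y_{(2,2)}]^T$, $\mathbf{y}_2 = [y_{(1,2)}, y_{(2,1)}]^T$, $\mathbf{n}_1 = [n_{(1,1)}, n_{(2,2)}]^T$, and $\mathbf{n}_2 = [n_{(1,2)}, n_{(2,1)}]^T$; then (17) can be further rewritten as

$$
\mathbf{y}_1 = \mathbf{\Lambda}\mathbf{G}\mathbf{x}_1 + \mathbf{n}_1, \qquad \mathbf{y}_2 = \mathbf{\Phi}\mathbf{\Lambda}\mathbf{G}\mathbf{x}_2 + \mathbf{n}_2, \tag{18}
$$

where

$$
\mathbf{\Phi} = \begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}.
$$

The input-output relation in (18) implies that the threaded structure of the codeword in (1) is now separated with the knowledge of CSIT, and therefore $\mathbf{x}_1$ and $\mathbf{x}_2$ can be decoded independently.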
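The regrouping in (17) and (18) can be verified numerically. The noiseless sketch below, with illustrative singular values and symbols and hypothetical helper names, builds $\mathbf{\Lambda}\mathbf{Z}$ for $D = 2$, collects the two threads, and compares them with $\mathbf{\Lambda}\mathbf{G}\mathbf{x}_1$ and $\mathbf{\Phi}\mathbf{\Lambda}\mathbf{G}\mathbf{x}_2$.

```python
import math

SQ5 = math.sqrt(5)
ALPHA, BETA = (1 + SQ5) / 2, (1 - SQ5) / 2
G = [[(1 + 1j * BETA) / SQ5, (ALPHA - 1j) / SQ5],
     [(1 + 1j * ALPHA) / SQ5, (BETA - 1j) / SQ5]]
PHI = [1, 1j]  # diagonal of Phi in (18)


def golden_codeword(x1, x2):
    """M{X} for D = 2, closed form (11) with gamma = i."""
    g = lambda u, x: G[u][0] * x[0] + G[u][1] * x[1]
    return [[g(0, x1), g(0, x2)], [1j * g(1, x2), g(1, x1)]]


def split_threads(Y):
    """Regroup the 2x2 received matrix as in (17): y_1 = [y11, y22], y_2 = [y12, y21]."""
    return [Y[0][0], Y[1][1]], [Y[0][1], Y[1][0]]


def lam_g(lam, x):
    """Lambda G x for diagonal Lambda = diag(lam)."""
    return [lam[r] * (G[r][0] * x[0] + G[r][1] * x[1]) for r in range(2)]


lam = [1.7, 0.4]                                  # illustrative singular values
x1, x2 = [1 + 1j, -1 + 1j], [-1 - 1j, 1 - 1j]      # illustrative 4-QAM symbols
Z = golden_codeword(x1, x2)
Y = [[lam[r] * Z[r][c] for c in range(2)] for r in range(2)]   # noiseless Lambda Z
y1, y2 = split_threads(Y)
err1 = max(abs(a - b) for a, b in zip(y1, lam_g(lam, x1)))
err2 = max(abs(a - PHI[r] * b) for r, (a, b) in enumerate(zip(y2, lam_g(lam, x2))))
print(err1, err2)   # both zero up to floating-point rounding
```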

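By using the QR decomposition $\mathbf{\Lambda}\mathbf{G} = \mathbf{Q}\mathbf{R}$, where $\mathbf{R}$ is an upper triangular matrix and $\mathbf{Q}$ is unitary, (18) is rewritten as

$$
\begin{aligned}
\tilde{\mathbf{y}}_1 &= \mathbf{Q}^H\mathbf{y}_1 = \mathbf{R}\mathbf{x}_1 + \mathbf{Q}^H\mathbf{n}_1 = \mathbf{R}\mathbf{x}_1 + \tilde{\mathbf{n}}_1, \\
\tilde{\mathbf{y}}_2 &= \mathbf{Q}^H\mathbf{\Phi}^H\mathbf{y}_2 = \mathbf{R}\mathbf{x}_2 + \mathbf{Q}^H\mathbf{\Phi}^H\mathbf{n}_2 = \mathbf{R}\mathbf{x}_2 + \tilde{\mathbf{n}}_2.
\end{aligned} \tag{19}
$$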

Indeed, each relation in (19) has the same form as FPMB presented in [9], [11], [12], which is the state-of-the-art full-diversity full-multiplexing SVD-based uncoded technique. FPMB is the special case of Constellation Precoded Multiple Beamforming (CPMB), whose system model is presented in Fig. 1(c), in which the number of precoded symbol streams equals the number of employed subchannels. In Fig. 1(c), $\mathbf{\Theta}_P$ is the constellation precoding matrix that precodes $P$ symbol streams, and $\mathbf{T}$ is a permutation matrix that selects the precoded subchannels. In [33], [34], a reduced-complexity SD is introduced. The technique takes advantage of a special real lattice representation, which introduces orthogonality between the real and imaginary parts of each symbol, thus enabling rounding (or quantization) for the last two layers of the SD. When the dimension is $2 \times 2$, it achieves ML performance with a worst-case decoding complexity of $O(M)$. This technique can be employed to decode both GCMB and $2 \times 2$ FPMB, since their input-output relations can be written in the same form as (19).

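Furthermore, even lower decoding complexity can be achieved for GCMB because of a special property of the $\mathbf{G}$ matrix. The $\mathbf{G}$ matrix for dimension 2 is given in [16] as

$$
\mathbf{G} = \frac{1}{\sqrt{5}} \begin{bmatrix} 1 + i\beta & \alpha - i \\ 1 + i\alpha & \beta - i \end{bmatrix},
$$

with $\alpha = \frac{1 + \sqrt{5}}{2}$ and $\beta = \frac{1 - \sqrt{5}}{2}$. Let $\mathbf{f}_v$, where $v \in \{1, 2\}$, denote the $v$th column of

$$
\mathbf{\Lambda}\mathbf{G} = \frac{1}{\sqrt{5}} \begin{bmatrix} \lambda_1(1 + i\beta) & \lambda_1(\alpha - i) \\ \lambda_2(1 + i\alpha) & \lambda_2(\beta - i) \end{bmatrix}. \tag{20}
$$

The nonzero elements of the upper triangular matrix $\mathbf{R}$ are calculated as

$$
r_{(1,1)} = \|\mathbf{f}_1\|, \qquad r_{(1,2)} = \frac{\mathbf{f}_1^H\mathbf{f}_2}{\|\mathbf{f}_1\|} = \frac{\lambda_1^2 - \lambda_2^2}{\sqrt{5}\,\|\mathbf{f}_1\|}, \qquad r_{(2,2)} = \left\| \mathbf{f}_2 - \frac{\mathbf{f}_1^H\mathbf{f}_2}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 \right\|. \tag{21}
$$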

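Note that $\mathbf{R}$ is in general complex-valued when the QR decomposition is applied to a complex-valued matrix. However, based on (21), the $\mathbf{R}$ matrix is real-valued for GCMB, owing to the special property of the $\mathbf{G}$ matrix: since $\alpha\beta = -1$, the imaginary parts in $\mathbf{f}_1^H\mathbf{f}_2$ cancel, leaving $\mathbf{f}_1^H\mathbf{f}_2 = (\lambda_1^2 - \lambda_2^2)/\sqrt{5}$. Hence, the real and imaginary parts of (19) can be decoded separately.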
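The realness of $\mathbf{R}$ can be checked with a two-column Gram-Schmidt QR decomposition, which produces exactly the entries in (21). In the sketch below the singular values are illustrative, and the second matrix `U` is an arbitrary unitary matrix chosen only as a contrast; for it, $r_{(1,2)}$ acquires a nonzero imaginary part. Helper names are hypothetical.

```python
import cmath
import math

SQ5 = math.sqrt(5)
ALPHA, BETA = (1 + SQ5) / 2, (1 - SQ5) / 2
G = [[(1 + 1j * BETA) / SQ5, (ALPHA - 1j) / SQ5],
     [(1 + 1j * ALPHA) / SQ5, (BETA - 1j) / SQ5]]


def qr_2x2(A):
    """Gram-Schmidt QR of a 2x2 complex matrix, returning Q (as a list of columns) and R."""
    inner = lambda a, b: sum(p.conjugate() * q for p, q in zip(a, b))   # a^H b
    f1, f2 = [A[0][0], A[1][0]], [A[0][1], A[1][1]]
    r11 = math.sqrt(inner(f1, f1).real)
    q1 = [v / r11 for v in f1]
    r12 = inner(q1, f2)                          # = f1^H f2 / ||f1||, as in (21)
    resid = [b - r12 * a for a, b in zip(q1, f2)]
    r22 = math.sqrt(inner(resid, resid).real)
    q2 = [v / r22 for v in resid]
    return [q1, q2], [[r11, r12], [0.0, r22]]


def scale_rows(lam, A):
    """Lambda A for diagonal Lambda = diag(lam)."""
    return [[lam[r] * A[r][c] for c in range(2)] for r in range(2)]


lam = [1.7, 0.4]                                 # illustrative singular values
Q, R = qr_2x2(scale_rows(lam, G))
print(R[0][1])                                   # imaginary part is zero up to rounding

# Contrast: an arbitrary unitary matrix (illustrative) gives a complex r_(1,2).
phase = cmath.exp(1j * math.pi / 4)
U = [[1 / math.sqrt(2), phase / math.sqrt(2)],
     [-phase.conjugate() / math.sqrt(2), 1 / math.sqrt(2)]]
_, R_generic = qr_2x2(scale_rows(lam, U))
print(R_generic[0][1])
```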
Consequently, (19) can be decomposed further as

$$
\Re(\tilde{\mathbf{y}}_u) = \mathbf{R}\,\Re(\mathbf{x}_u) + \Re(\tilde{\mathbf{n}}_u), \qquad \Im(\tilde{\mathbf{y}}_u) = \mathbf{R}\,\Im(\mathbf{x}_u) + \Im(\tilde{\mathbf{n}}_u), \tag{22}
$$

with $u \in \{1, 2\}$. To decode each part of (22), a two-level real-valued SD can be employed, with the rounding procedure applied to the last layer. As a result, the worst-case decoding complexity of GCMB is $O(\sqrt{M})$.
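Because the real and imaginary parts in (22) are decoupled, each part is a two-dimensional real problem over $\sqrt{M}$-PAM. The sketch below assumes 16-QAM (4-PAM per axis), an illustrative real upper-triangular $\mathbf{R}$, and Gaussian noise. It enumerates the bottom layer, which is the worst case of the two-level real-valued SD, rounds the top layer, and compares the result with exhaustive search; the enumeration visits only $\sqrt{M}$ branches, matching the $O(\sqrt{M})$ count. Function names are hypothetical.

```python
import itertools
import random


def pam_levels(M):
    """sqrt(M)-PAM levels {-(L-1), ..., -1, 1, ..., L-1} of square M-QAM (unnormalized)."""
    L = int(round(M ** 0.5))
    return [2 * k - (L - 1) for k in range(L)]


def slice_pam(t, L):
    """Nearest L-PAM level to the real number t (the rounding step)."""
    s = 2 * round((t - 1) / 2) + 1        # nearest odd integer
    return max(-(L - 1), min(L - 1, s))


def detect_rounding(z, R, M):
    """ML detection of s in z = R s + n (R real 2x2 upper triangular, r11 > 0).

    The bottom layer s2 is enumerated over all sqrt(M) levels (worst case of a
    two-level real-valued SD); the top layer s1 is obtained by rounding.
    """
    L = int(round(M ** 0.5))
    best, best_cost = None, float("inf")
    for s2 in pam_levels(M):
        s1 = slice_pam((z[0] - R[0][1] * s2) / R[0][0], L)
        cost = (z[0] - R[0][0] * s1 - R[0][1] * s2) ** 2 + (z[1] - R[1][1] * s2) ** 2
        if cost < best_cost:
            best, best_cost = (s1, s2), cost
    return best


def detect_brute(z, R, M):
    """Reference: exhaustive search over all sqrt(M)**2 real pairs."""
    cost = lambda s: (z[0] - R[0][0] * s[0] - R[0][1] * s[1]) ** 2 + (z[1] - R[1][1] * s[1]) ** 2
    return min(itertools.product(pam_levels(M), repeat=2), key=cost)


rng = random.Random(0)
R = [[1.2, 0.7], [0.0, 0.5]]       # illustrative real upper-triangular R
M = 16
agree = 0
for _ in range(200):
    s = (rng.choice(pam_levels(M)), rng.choice(pam_levels(M)))
    z = [R[0][0] * s[0] + R[0][1] * s[1] + rng.gauss(0, 0.8), R[1][1] * s[1] + rng.gauss(0, 0.8)]
    agree += detect_rounding(z, R, M) == detect_brute(z, R, M)
print(agree)   # number of trials (out of 200) in which both detectors agree
```

Rounding the top layer is exact here because, for a fixed $s_2$ and $r_{(1,1)} > 0$, the best $s_1$ is simply the PAM level nearest to $(z_1 - r_{(1,2)}s_2)/r_{(1,1)}$.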

Previously, the ML decoding of GC was shown to have a worst-case complexity of $O(M^{2.5})$ [22]-[24]. However, the above analysis shows that this complexity can be reduced substantially, to only $O(\sqrt{M})$, by applying GCMB when CSIT is available. Furthermore, the complexity of GCMB is lower than that of FPMB as well: the worst-case decoding complexity of $2 \times 2$ FPMB with the decoding technique presented in [33], [34] is $O(M)$, as mentioned above.

C. PCMB 

For PCMB of dimension $D \in \{3, 4, 6\}$, it can be proved that the full diversity order of $D^2$ is achieved, by generalizing (13)-(16), which remain valid for larger $D$.

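For the decoding of PCMB in dimension $D \in \{3, 4, 6\}$, similarly to GCMB, each element of $\mathbf{\Lambda}\mathbf{Z}$ is related to only one of the $\mathbf{x}_v$, so the elements can be divided into $D$ groups, where the $v$th group contains the elements related to $\mathbf{x}_v$. The received signal is then divided into $D$ parts, which can be represented as

$$
\mathbf{y}_v = \mathbf{\Phi}_v\mathbf{\Lambda}\mathbf{G}\mathbf{x}_v + \mathbf{n}_v, \tag{23}
$$

where $\mathbf{y}_v$ and $\mathbf{n}_v$ collect the entries of $\mathbf{Y}$ and $\mathbf{N}$ at positions $(k, ((k + v - 2) \bmod D) + 1)$, $k = 1, \ldots, D$, and $\mathbf{\Phi}_v = \operatorname{diag}(\phi_{v,1}, \ldots, \phi_{v,D})$ is a diagonal unitary matrix whose elements satisfy

$$
\phi_{v,k} = \begin{cases} 1, & 1 \le k \le D + 1 - v, \\ \gamma, & D + 2 - v \le k \le D. \end{cases}
$$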

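By using the QR decomposition $\mathbf{\Lambda}\mathbf{G} = \mathbf{Q}\mathbf{R}$ and moving $\mathbf{\Phi}_v\mathbf{Q}$ to the left-hand side, (23) is rewritten as

$$
\tilde{\mathbf{y}}_v = \mathbf{Q}^H\mathbf{\Phi}_v^H\mathbf{y}_v = \mathbf{R}\mathbf{x}_v + \mathbf{Q}^H\mathbf{\Phi}_v^H\mathbf{n}_v = \mathbf{R}\mathbf{x}_v + \tilde{\mathbf{n}}_v. \tag{24}
$$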
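The thread decomposition (23) follows only from the structure of $\mathbf{E}$, so it can be verified with any $\mathbf{G}$. The sketch below uses a random complex $\mathbf{G}$ and values of $\gamma$ ($i$ for $D = 4$ and $e^{2\pi i/3}$ for $D = 3$) that serve purely as test inputs; it builds $\mathbf{\Lambda}\mathbf{Z}$ from (1) and checks that the entries of each thread equal $\mathbf{\Phi}_v\mathbf{\Lambda}\mathbf{G}\mathbf{x}_v$. All helper names are hypothetical.

```python
import cmath
import random


def pstbc(G, xs, gamma):
    """Z = sum_v diag(G x_v) E^(v-1) from (1), written entrywise.

    With 0-based row k and v0 = v - 1, thread v occupies column (k + v0) mod D of
    row k and is multiplied by gamma when k + v0 >= D (the index wraps around).
    """
    D = len(G)
    Z = [[0j] * D for _ in range(D)]
    for v0, x in enumerate(xs):
        Gx = [sum(G[k][l] * x[l] for l in range(D)) for k in range(D)]
        for k in range(D):
            Z[k][(k + v0) % D] += Gx[k] * (gamma if k + v0 >= D else 1)
    return Z


def thread(Y, v0):
    """y_v: the D entries of Y occupied by thread v = v0 + 1."""
    D = len(Y)
    return [Y[k][(k + v0) % D] for k in range(D)]


def phi(D, v0, gamma):
    """Diagonal of Phi_v: 1 for k <= D + 1 - v and gamma otherwise (1-based k and v)."""
    return [1 if k + v0 < D else gamma for k in range(D)]


def max_thread_error(D, gamma, seed=0):
    """Largest |y_v - Phi_v Lambda G x_v| over all threads for a random noiseless example."""
    rng = random.Random(seed)
    G = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(D)] for _ in range(D)]
    xs = [[complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(D)] for _ in range(D)]
    lam = sorted((rng.uniform(0.1, 2.0) for _ in range(D)), reverse=True)
    Z = pstbc(G, xs, gamma)
    Y = [[lam[k] * Z[k][c] for c in range(D)] for k in range(D)]
    err = 0.0
    for v0, x in enumerate(xs):
        pred = [phi(D, v0, gamma)[k] * lam[k] * sum(G[k][l] * x[l] for l in range(D))
                for k in range(D)]
        err = max(err, max(abs(a - b) for a, b in zip(thread(Y, v0), pred)))
    return err


print(max_thread_error(4, 1j), max_thread_error(3, cmath.exp(2j * cmath.pi / 3), seed=1))
```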

For dimension 4, the $\mathbf{R}$ matrix in (24) is real-valued, which can be proved in a similar way to GCMB in Section III-B; see the Appendix for the proof. Consequently, the real and imaginary parts of $\mathbf{x}_v$ can be decoded separately as in (22), with $u \in \{1, \ldots, 4\}$. A real-valued SD with the last layer rounded can be employed, and the worst-case decoding complexity of PCMB is then $O(M^{1.5})$. For a MIMO system employing PSTBC, the worst-case decoding complexity is $O(M^{14.5})$ by using the technique presented in [24]. For FPMB, ML decoding can be achieved by using SD based on the real lattice representation in [33], [34], plus quantization of the last two layers, and the worst-case complexity is then $O(M^3)$.
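The PCMB and FPMB exponents above can be reproduced by counting enumerated layers, under the simplifying assumption that each enumerated real layer contributes at most $\sqrt{M}$ branches, each enumerated complex layer at most $M$, and each rounded or sliced layer a single branch. The sketch below is bookkeeping under that assumption, not a complexity proof; the function names are hypothetical.

```python
from fractions import Fraction


def real_sd_exponent(real_layers, rounded_layers):
    """Exponent e of the O(M^e) worst case for a real-valued SD over sqrt(M)-PAM.

    Every enumerated layer contributes up to sqrt(M) = M^(1/2) branches;
    a rounded (quantized) layer contributes a single branch.
    """
    return Fraction(real_layers - rounded_layers, 2)


def complex_sd_exponent(complex_layers, sliced_layers):
    """Same count for a complex-valued SD over an M-point constellation (e.g., M-HEX)."""
    return Fraction(complex_layers - sliced_layers)


for D in (2, 4):
    # PCMB: each real/imaginary part in (22) has D real layers, the last one rounded.
    # FPMB: real-lattice representation with 2D real layers, the last two quantized.
    print(f"D={D}: PCMB M^{float(real_sd_exponent(D, 1))}, "
          f"FPMB M^{float(real_sd_exponent(2 * D, 2))}")
for D in (3, 6):
    # PCMB (and, similarly, FPMB) with M-HEX: D complex layers, the last one sliced.
    print(f"D={D}: PCMB M^{float(complex_sd_exponent(D, 1))}")
```

The printed exponents (0.5, 1, 1.5, 3 for $D \in \{2, 4\}$ and 2, 5 for $D \in \{3, 6\}$) match the values stated for PCMB and FPMB in this section.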

Unfortunately, for $D = 3$ ($D = 6$), the $\mathbf{R}$ matrix is complex-valued. Therefore, unlike the cases $D = 2$ and $D = 4$, the real and imaginary parts of $\mathbf{x}_v$ cannot be decoded separately. Moreover, since M-HEX modulation is employed, which cannot be separated into two independent one-dimensional modulations as M-QAM can, a complex-valued SD, instead of a real-valued SD, with an efficient slicer implementation [24] is applied. The worst-case decoding complexity of PCMB is then $O(M^2)$ ($O(M^5)$). For a MIMO system employing PSTBC, the worst-case decoding complexity is $O(M^7)$ ($O(M^{34})$) [24]. For FPMB, the worst-case decoding complexity is $O(M^2)$ ($O(M^5)$), which is similar to that of PCMB.

D. Discussion 

Table I summarizes, for different dimensions, the worst-case complexity of decoding one received symbol vector for a MIMO system employing PSTBC (denoted by PC), FPMB, and PCMB.

As shown in Table I, the decoding complexity of PCMB is substantially lower than that of PC. In fact, the high decoding complexity results from the threaded structure of PSTBCs in (1). With the knowledge of CSIT, PCMB separates the threaded structure of PSTBCs at the receiver, and thereby reduces the dimension of the decoding problem from $D^2$ to $D$, as shown in (19) and (24); this is the main source of the complexity advantage of PCMB over PC. This result provides a new prospect for CSIT: whereas CSIT has mostly been applied to increase the throughput or to enhance performance, it can also be applied to reduce the decoding complexity of a MIMO system.

Nevertheless, there are always tradeoffs among throughput, reliability, and complexity for MIMO systems in general [2], [3]. In fact, the nonvanishing constant minimum determinant of PSTBCs, which offers high coding gain, also derives from the threaded structure. As a result, this property no longer holds for PCMB, which sacrifices coding gain. The coding gain loss is hard to quantify, but simulation results in Section V-A show that it causes only negligible or modest performance loss. Apart from the nonvanishing constant minimum determinant, the other good properties of PSTBCs, i.e., full rate, full diversity, uniform average transmitted energy per antenna, and good shaping, still hold for PCMB.

In fact, since PSTBCs belong to the class of Threaded Algebraic Space-Time (TAST) codes [35], and CSIT separates the threaded structure at the receiver, the same idea can be applied to reduce the decoding complexity of general TAST codes as well. However, PCMB of dimensions 2 and 4 has a further advantage in decoding complexity due to the real-valued upper triangular $\mathbf{R}$ matrices in (19) and (24), which lead to the separate decoding of the real and imaginary parts as in (22). In other words, the $D$-dimensional complex-valued decoding problem can be further decomposed into two $D$-dimensional real-valued decoding problems as in (22). This advantage is related to the special property of the generation matrices of PSTBCs in these two dimensions, and therefore does not hold for general TAST codes.
Note that the full diversity of PCMB results from the fact that all elements of $\mathbf{g}_1^T$ are nonzero, the complexity advantage of PCMB over PC is due to the knowledge of CSIT and the threaded structure of PSTBCs, and the additional complexity reduction of PCMB in dimensions 2 and 4 is caused by the real-valued $\mathbf{R}$ matrices. Since these features are not related to other good properties of PSTBCs, such as uniform average transmitted energy per antenna and good shaping, there may exist other space-time block codes that can achieve the same advantages as PCMB.

CSIT is assumed to be known for both PCMB and FPMB, and both of them achieve full diversity and full multiplexing. In dimensions 3 and 6, the decoding complexity of PCMB is similar to that of FPMB. However, in dimensions 2 and 4, PCMB has a significant decoding complexity advantage over FPMB, because the $\mathbf{R}$ matrices of PCMB are real-valued whereas those of FPMB are complex-valued. On the other hand, since FPMB is designed to achieve high array gain [9], while PCMB does not concentrate on this aspect, a tradeoff between complexity and array gain might exist. The array gain loss is hard to quantify, but simulation results in Section V-A show that PCMB causes only negligible or modest loss compared to FPMB.



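IV. BICMB-PC

In this section, the diversity and decoding complexity analyses of BICMB-GC, which is BICMB-PC of dimension 2, are first carried out in Section IV-A and Section IV-B, respectively. Then, they are generalized to larger dimensions in Section IV-C. More discussion is provided in Section IV-D.

A. Diversity Analysis

Based on the bit metrics in (6), the instantaneous PEP of BICMB-GC between the transmitted bit codeword $\mathbf{c}$ and the decoded bit codeword $\hat{\mathbf{c}}$ is

$$
\Pr(\mathbf{c} \rightarrow \hat{\mathbf{c}} \mid \mathbf{H}) = \Pr\left( \sum_{k'} \min_{\mathbf{X} \in \eta_{c_{k'}}^{(m,n),j}} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\mathcal{M}\{\mathbf{X}\}\right\|^2 \ge \sum_{k'} \min_{\mathbf{X} \in \eta_{\hat{c}_{k'}}^{(m,n),j}} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\mathcal{M}\{\mathbf{X}\}\right\|^2 \,\middle|\, \mathbf{H} \right), \tag{25}
$$

where $c_{k'}$ and $\hat{c}_{k'}$ are the coded bits of $\mathbf{c}$ and $\hat{\mathbf{c}}$, respectively. Let $d_H$ denote the Hamming distance between $\mathbf{c}$ and $\hat{\mathbf{c}}$. Since the bit metrics of the coded bits that are identical in the two codewords are the same, (25) is rewritten as

$$
\Pr(\mathbf{c} \rightarrow \hat{\mathbf{c}} \mid \mathbf{H}) = \Pr\left( \sum_{k',d_H} \min_{\mathbf{X} \in \eta_{c_{k'}}^{(m,n),j}} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\mathcal{M}\{\mathbf{X}\}\right\|^2 \ge \sum_{k',d_H} \min_{\mathbf{X} \in \eta_{\hat{c}_{k'}}^{(m,n),j}} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\mathcal{M}\{\mathbf{X}\}\right\|^2 \,\middle|\, \mathbf{H} \right), \tag{26}
$$

where $\sum_{k',d_H}$ stands for the summation over the $d_H$ values of $k'$ corresponding to the coded bits that differ between the two bit codewords.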
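Define $\tilde{\mathbf{X}}_k$ and $\hat{\mathbf{X}}_k$ as

$$
\tilde{\mathbf{X}}_k = \arg\min_{\mathbf{X} \in \eta_{c_{k'}}^{(m,n),j}} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\mathcal{M}\{\mathbf{X}\}\right\|^2, \qquad \hat{\mathbf{X}}_k = \arg\min_{\mathbf{X} \in \eta_{\hat{c}_{k'}}^{(m,n),j}} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\mathcal{M}\{\mathbf{X}\}\right\|^2. \tag{27}
$$

It is easily found that $\hat{\mathbf{X}}_k$ is different from $\tilde{\mathbf{X}}_k$, since the sets to which $X_{(m,n)}$ belongs are disjoint, as can be seen from the definition of $\eta_b^{(m,n),j}$. In the same manner, it is clear that $\hat{\mathbf{X}}_k$ is different from $\mathbf{X}_k$. With $\tilde{\mathbf{Z}}_k = \mathcal{M}\{\tilde{\mathbf{X}}_k\}$ and $\hat{\mathbf{Z}}_k = \mathcal{M}\{\hat{\mathbf{X}}_k\}$, (26) is rewritten as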



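$$
\Pr(\mathbf{c} \rightarrow \hat{\mathbf{c}} \mid \mathbf{H}) = \Pr\left( \sum_{k',d_H} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\tilde{\mathbf{Z}}_k\right\|^2 \ge \sum_{k',d_H} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\hat{\mathbf{Z}}_k\right\|^2 \,\middle|\, \mathbf{H} \right). \tag{28}
$$

Based on the fact that $\|\mathbf{Y}_k - \mathbf{\Lambda}\mathbf{Z}_k\|^2 \ge \|\mathbf{Y}_k - \mathbf{\Lambda}\tilde{\mathbf{Z}}_k\|^2$ and the relation in (5), equation (28) is upper bounded by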



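$$
\Pr(\mathbf{c} \rightarrow \hat{\mathbf{c}} \mid \mathbf{H}) \le \Pr\left( \sum_{k',d_H} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\mathbf{Z}_k\right\|^2 \ge \sum_{k',d_H} \left\|\mathbf{Y}_k - \mathbf{\Lambda}\hat{\mathbf{Z}}_k\right\|^2 \,\middle|\, \mathbf{H} \right) = \Pr\left( \epsilon \ge \sum_{k',d_H} \left\|\mathbf{\Lambda}(\mathbf{Z}_k - \hat{\mathbf{Z}}_k)\right\|^2 \,\middle|\, \mathbf{H} \right), \tag{29}
$$

where $\epsilon = \sum_{k',d_H} \mathrm{Tr}\{-(\mathbf{Z}_k - \hat{\mathbf{Z}}_k)^H\mathbf{\Lambda}^H\mathbf{N}_k - \mathbf{N}_k^H\mathbf{\Lambda}(\mathbf{Z}_k - \hat{\mathbf{Z}}_k)\}$. Since $\epsilon$ is a zero-mean Gaussian random variable with variance $2N_0\sum_{k',d_H}\|\mathbf{\Lambda}(\mathbf{Z}_k - \hat{\mathbf{Z}}_k)\|^2$, the average PEP can be upper bounded in a similar fashion to (9) and (10) as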



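$$
\Pr(\mathbf{c} \rightarrow \hat{\mathbf{c}}) = E\left[ \Pr(\mathbf{c} \rightarrow \hat{\mathbf{c}} \mid \mathbf{H}) \right] \le E\left[ \frac{1}{2}\exp\left( -\frac{\sum_{k',d_H} \|\mathbf{\Lambda}(\mathbf{Z}_k - \hat{\mathbf{Z}}_k)\|^2}{4N_0} \right) \right]. \tag{30}
$$

According to (13), (30) is rewritten as

$$
\Pr(\mathbf{c} \rightarrow \hat{\mathbf{c}}) \le E\left[ \frac{1}{2}\exp\left( -\frac{\sum_{u=1}^{D} \lambda_u^2 \sum_{k',d_H} \rho_{u,k}}{4N_0} \right) \right], \tag{31}
$$

where

$$
\rho_{u,k} = \sum_{v=1}^{D} \left| \mathbf{g}_u^T(\mathbf{x}_{v,k} - \hat{\mathbf{x}}_{v,k}) \right|^2, \tag{32}
$$

and $D = 2$ for the purposes of (31)-(33) in this subsection. As will be discussed later, (31)-(33) are actually valid for larger values of $D$ as well.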

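Applying the theorem presented in Section III-A to (31), $\delta = 1$ because $\rho_{1,k} > 0$ in (32). Therefore, an upper bound of the PEP is

$$
\Pr(\mathbf{c} \rightarrow \hat{\mathbf{c}}) \le \zeta \left( \frac{\rho_{\min}\,\mathrm{SNR}}{4D} \right)^{-N_r N_t}, \tag{33}
$$

where $\rho_{\min}$ now denotes the smallest nonzero element of the weight vector whose $u$th entry is $\sum_{k',d_H}\rho_{u,k}$. Since $N_t = N_r = D = 2$ in this case, BICMB-GC achieves the full diversity order of 4.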



B. Decoding 

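Similarly to (12), each element of $\mathbf{\Lambda}\mathbf{Z}_k$ for BICMB-GC in (5) is related to only $\mathbf{x}_{1,k}$ or only $\mathbf{x}_{2,k}$. Consequently, the elements of $\mathbf{\Lambda}\mathbf{Z}_k$ can be divided into two groups, where the first and second groups contain the elements related to $\mathbf{x}_{1,k}$ and $\mathbf{x}_{2,k}$, respectively. The input-output relation in (5) is then decomposed into two equations, similarly to (18), as

$$
\mathbf{y}_{1,k} = \mathbf{\Lambda}\mathbf{G}\mathbf{x}_{1,k} + \mathbf{n}_{1,k}, \qquad \mathbf{y}_{2,k} = \mathbf{\Phi}\mathbf{\Lambda}\mathbf{G}\mathbf{x}_{2,k} + \mathbf{n}_{2,k}, \tag{34}
$$

where $\mathbf{y}_{1,k} = [Y_{(1,1),k}, Y_{(2,2),k}]^T$, $\mathbf{y}_{2,k} = [Y_{(1,2),k}, Y_{(2,1),k}]^T$, $\mathbf{n}_{1,k} = [N_{(1,1),k}, N_{(2,2),k}]^T$, and $\mathbf{n}_{2,k} = [N_{(1,2),k}, N_{(2,1),k}]^T$, with $Y_{(m,n),k}$ and $N_{(m,n),k}$ denoting the $(m,n)$th elements of $\mathbf{Y}_k$ and $\mathbf{N}_k$, respectively. By using the QR decomposition $\mathbf{\Lambda}\mathbf{G} = \mathbf{Q}\mathbf{R}$, (34) is rewritten as

$$
\begin{aligned}
\tilde{\mathbf{y}}_{1,k} &= \mathbf{Q}^H\mathbf{y}_{1,k} = \mathbf{R}\mathbf{x}_{1,k} + \mathbf{Q}^H\mathbf{n}_{1,k} = \mathbf{R}\mathbf{x}_{1,k} + \tilde{\mathbf{n}}_{1,k}, \\
\tilde{\mathbf{y}}_{2,k} &= \mathbf{Q}^H\mathbf{\Phi}^H\mathbf{y}_{2,k} = \mathbf{R}\mathbf{x}_{2,k} + \mathbf{Q}^H\mathbf{\Phi}^H\mathbf{n}_{2,k} = \mathbf{R}\mathbf{x}_{2,k} + \tilde{\mathbf{n}}_{2,k}.
\end{aligned} \tag{35}
$$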

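Then the ML bit metrics in (6) can be simplified as

$$
\Gamma^{(m,n),j}(\mathbf{Y}_k, c_{k'}) = \min_{\mathbf{x} \in \xi_{c_{k'}}^{(m,n),j}} \left\| \tilde{\mathbf{y}}_{m,k} - \mathbf{R}\mathbf{x} \right\|^2, \tag{36}
$$

where $\xi_b^{(m,n),j}$ is a subset of $\chi^D$, defined as

$$
\xi_b^{(m,n),j} = \left\{ \mathbf{x} = [x_1, \ldots, x_D]^T : x_{d=n} \in \chi_b^j \ \text{and}\ x_{d\neq n} \in \chi \right\}.
$$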

Indeed, the simplified ML bit metrics (36) have the same form as BICMB-FP presented in [10], [11], [12], which is the state-of-the-art full-diversity full-multiplexing SVD-based coded technique. BICMB-FP is the special case of Bit-Interleaved Coded Multiple Beamforming with Constellation Precoding (BICMB-CP), whose system model is presented in Fig. 1(d), in which the number of precoded symbol streams equals the number of employed subchannels. To calculate one ML bit metric by exhaustive search, $M^2/2$ constellation points are considered, and the complexity is thereby $O(M^2)$. If the SD presented in [33], [34] is employed, the worst-case complexity for acquiring one ML bit metric is $O(M)$ for both BICMB-GC and $2 \times 2$ BICMB-FP.

Moreover, similarly to the uncoded case, lower decoding complexity can be achieved for BICMB-GC 
because the R matrix in (|36l ) is real- valued as proved in Section IIII-BI As a result, the real and imaginary 
parts of ym,k in (|36l) can be separated, and only the part corresponding to the coded bit is required for 
calculating one bit metric of the Viterbi decoder. Assume that square M-QAM is used, whose real and 
imaginary parts are Gray coded separately as two v^M-PAM. Define ^[^cy] and as the signal 

sets of the real and the imaginary axes of ^"^j, respectively. Therefore, the ML bit metrics in (l36l) can be 
further simplified as 

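$$\gamma^{(i)}(\mathbf{Y}_k, c_{k'}) = \min_{\Re[\mathbf{x}] \in \Re[\xi^{(i)}_{c_{k'}}]} \left\|\Re[\tilde{\mathbf{y}}_{m,k}] - \mathbf{R}\,\Re[\mathbf{x}]\right\|^2 \qquad (37)$$

if the bit position of $c_{k'}$ is on the real part, or

$$\gamma^{(i)}(\mathbf{Y}_k, c_{k'}) = \min_{\Im[\mathbf{x}] \in \Im[\xi^{(i)}_{c_{k'}}]} \left\|\Im[\tilde{\mathbf{y}}_{m,k}] - \mathbf{R}\,\Im[\mathbf{x}]\right\|^2 \qquad (38)$$

if the bit position of $c_{k'}$ is on the imaginary part. For (37) and (38), the worst-case complexity of acquiring one bit metric is only $O(\sqrt{M})$ when a real-valued SD with the last layer rounded is used. This is much lower than the $O(M)$ of 2 × 2 BICMB-FP.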
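The step from (36) to (37) can be verified numerically. When R is real, $\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{x}\|^2$ splits into a real-axis term and an imaginary-axis term. For a bit on the real axis, the imaginary-axis minimum is the same for $c_{k'} = 0$ and $c_{k'} = 1$. Every path through the Viterbi trellis collects exactly one metric per coded bit, so adding that common term shifts all path metrics equally and can be dropped. The sketch below assumes D = 2, square 16-QAM built from a hypothetical Gray-labelled 4-PAM on each axis, and an arbitrary real upper-triangular R. All helper names are hypothetical.

```python
import itertools
import random

# Hypothetical Gray-labelled 4-PAM used on each axis of square 16-QAM.
PAM4 = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): 1.0, (1, 0): 3.0}
# 16-QAM label = real-axis bits (positions 0, 1) + imaginary-axis bits (positions 2, 3).
QAM16 = [(lr + li, complex(pr, pim))
         for (lr, pr), (li, pim) in itertools.product(PAM4.items(), repeat=2)]


def sq_err(y, rmat, x):
    """||y - R x||^2 for lists y, x and a list-of-rows matrix R."""
    d = len(y)
    return sum(abs(y[r] - sum(rmat[r][c] * x[c] for c in range(d))) ** 2 for r in range(d))


def complex_metric(y, rmat, n, i, b):
    """Metric (36): real-axis bit i of symbol n fixed to b, full 16-QAM search."""
    best = float("inf")
    for combo in itertools.product(QAM16, repeat=len(y)):
        if combo[n][0][i] == b:
            best = min(best, sq_err(y, rmat, [p for _, p in combo]))
    return best


def real_metric(y_re, rmat, n, i, b):
    """Metric (37): same bit, real parts only, 4-PAM search per layer."""
    best = float("inf")
    for combo in itertools.product(PAM4.items(), repeat=len(y_re)):
        if combo[n][0][i] == b:
            best = min(best, sq_err(y_re, rmat, [p for _, p in combo]))
    return best


def imag_term(y_im, rmat):
    """Unconstrained imaginary-axis minimum; identical for b = 0 and b = 1."""
    return min(sq_err(y_im, rmat, list(x))
               for x in itertools.product(PAM4.values(), repeat=len(y_im)))


rng = random.Random(7)
R = [[1.2, -0.3], [0.0, 0.8]]  # arbitrary real upper-triangular R
y = [complex(rng.gauss(0, 2), rng.gauss(0, 2)) for _ in range(2)]
common = imag_term([v.imag for v in y], R)
for b in (0, 1):
    full = complex_metric(y, R, n=0, i=1, b=b)
    part = real_metric([v.real for v in y], R, n=0, i=1, b=b)
    print(b, full, part + common)
```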

C. BICMB-PC 

For BICMB-PC of dimension $D \in \{3, 4, 6\}$, it can be proved that the full diversity order of $D^2$ is achieved. The proof generalizes (31)-(33), which remain valid for larger D.

For the decoding of BICMB-PC in dimension $D \in \{3, 4, 6\}$, similarly to BICMB-GC, each element of $\boldsymbol{\Lambda}\mathbf{Z}_k$ is related to only one of the $\mathbf{x}_{v,k}$. The elements can therefore be divided into D groups, where the vth group contains the elements related to $\mathbf{x}_{v,k}$. The received signal is then divided into D parts, which can be represented similarly to (23) as



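$$\mathbf{y}_{v,k} = \boldsymbol{\Phi}_v\boldsymbol{\Lambda}\mathbf{G}\mathbf{x}_{v,k} + \mathbf{n}_{v,k}, \quad v \in \{1, \ldots, D\}. \qquad (39)$$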
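By using the QR decomposition $\boldsymbol{\Lambda}\mathbf{G} = \mathbf{Q}\mathbf{R}$ and moving $\boldsymbol{\Phi}_v$ to the left-hand side, (39) is rewritten as

$$\tilde{\mathbf{y}}_{v,k} = \mathbf{Q}^H\boldsymbol{\Phi}_v^H\mathbf{y}_{v,k} = \mathbf{R}\mathbf{x}_{v,k} + \mathbf{Q}^H\boldsymbol{\Phi}_v^H\mathbf{n}_{v,k} = \mathbf{R}\mathbf{x}_{v,k} + \tilde{\mathbf{n}}_{v,k}. \qquad (40)$$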

Then the ML bit metrics can be simplified to the form of (36).

In the case of D = 4, the R matrix in (36) is real-valued (see the Appendix for the proof). As a result, the real and imaginary parts of $\tilde{\mathbf{y}}_{m,k}$ in (36) can be separated, and only the part corresponding to the coded bit is required to calculate one bit metric for the Viterbi decoder. Assume that square M-QAM is employed. Then the ML bit metrics in (36) can be further simplified as (37) if the bit location of $c_{k'}$ is on the real part, or as (38) if it is on the imaginary part. Therefore, the worst-case complexity for calculating one bit metric is only $O(M^{1.5})$ when a real-valued SD with the last layer rounded is used. By contrast, BICMB-FP has a worst-case complexity of $O(M^3)$. This figure assumes a real-valued SD based on the real lattice representation in [33], [34], plus quantization of the last two layers.

For dimension 3 (6), the R matrix is complex-valued. Therefore, unlike the cases D = 2 and 4, the real and imaginary parts of $\tilde{\mathbf{y}}_{m,k}$ cannot be separated. Moreover, since M-HEX modulation is used instead of M-QAM, a complex-valued SD with an efficient slicer implementation [24] is needed. The worst-case complexity of BICMB-PC to derive one bit metric is then $O(M^2)$ ($O(M^5)$). For BICMB-FP, the worst-case complexity is $O(M^2)$ ($O(M^5)$), which is similar to that of BICMB-PC.
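A slicer returns the constellation point nearest to a complex value in a single step. The sketch below is only an illustration and is not necessarily the implementation of [24]. It slices onto the infinite hexagonal lattice $\{a + b\,e^{i\pi/3} : a, b \in \mathbb{Z}\}$, using the fact that this lattice is the union of two rectangular cosets. Rounding within each coset and keeping the closer candidate therefore gives the nearest lattice point. A finite M-HEX constellation would also require handling of its boundary, which is omitted here. `hex_slice` is a hypothetical name.

```python
import math
import random

SQRT3 = math.sqrt(3.0)


def hex_slice(z):
    """Nearest point to z in the infinite hexagonal lattice {a + b*w},
    w = exp(i*pi/3), a and b integers.

    The lattice is the union of the rectangular coset {m + i*n*sqrt(3)} and its
    shift by (1/2 + i*sqrt(3)/2), so rounding within each coset and keeping the
    closer result gives the nearest lattice point."""
    c0 = complex(round(z.real), round(z.imag / SQRT3) * SQRT3)
    c1 = complex(math.floor(z.real) + 0.5,
                 (math.floor(z.imag / SQRT3) + 0.5) * SQRT3)
    return c0 if abs(z - c0) <= abs(z - c1) else c1


# Usage: slice a few noisy values.
rng = random.Random(3)
for _ in range(3):
    z = complex(rng.uniform(-2, 2), rng.uniform(-2, 2))
    print(z, "->", hex_slice(z))
```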

D. Discussion 

The worst-case complexity of BICMB-PC and BICMB-FP for calculating one bit metric in different dimensions is also summarized in Table III. These values are the same as those of PCMB and FPMB, respectively.
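The worst-case exponents quoted for dimensions 2 and 4 follow from a simple count. For this illustration we assume that a depth-first SD enumerates every point on each layer except the final one(s), which are resolved directly by rounding or quantization. Each real $\sqrt{M}$-PAM layer then contributes a factor $\sqrt{M}$. The uncoded PCMB solves two such real problems, one for the real part and one for the imaginary part, which doubles the work but leaves the exponent unchanged. `worst_case_exponent` is a hypothetical helper.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # one real sqrt(M)-PAM layer contributes a factor M^(1/2)


def worst_case_exponent(layers, resolved, per_layer=HALF):
    """Exponent e in O(M^e) when all but the last `resolved` layers are enumerated."""
    return (layers - resolved) * per_layer


# (scheme, D) -> exponent, using the decoder structures described in the text.
CASES = {
    ("BICMB-GC / GCMB", 2): worst_case_exponent(2, 1),  # D real layers, last one rounded
    ("BICMB-FP / FPMB", 2): worst_case_exponent(4, 2),  # 2D real layers, last two quantized
    ("BICMB-PC / PCMB", 4): worst_case_exponent(4, 1),
    ("BICMB-FP / FPMB", 4): worst_case_exponent(8, 2),
}

for (scheme, dim), e in CASES.items():
    print(f"{scheme}, D = {dim}: O(M^{float(e):g})")
```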

Similarly, CSIT is assumed to be known for both BICMB-PC and BICMB-FP, and both achieve full diversity and full multiplexing. In dimensions 3 and 6, the worst-case complexity of BICMB-PC is similar to that of BICMB-FP. In dimensions 2 and 4, however, the real-valued R matrices in (35) and (40) give BICMB-PC a complexity advantage over BICMB-FP, whose R matrices are complex-valued. In these dimensions, the D-dimensional complex-valued metric calculation problem reduces to a single D-dimensional real-valued problem, because only the real or imaginary part corresponding to the coded bit needs to be considered. The uncoded case, by contrast, requires two D-dimensional real-valued problems. Therefore, the real-valued R matrices benefit BICMB-PC more than PCMB.

For the constellation precoding technique [11], [12], only full precoding (FPMB) achieves both full diversity and full multiplexing in the uncoded case. In the coded case, however, partial precoding (BICMB-PP) can also achieve both [12]. The precoded part of BICMB-PP can be regarded as a lower-dimensional BICMB-FP. Therefore, BICMB-PC of dimension 2 or 4 can be applied to replace the precoded part of BICMB-PP and reduce its decoding complexity.

V. Results 

A. PCMB 

As presented in Section III, PCMB has the greatest advantage in decoding complexity in dimensions 2 and 4. Therefore, the simulations focus on these two dimensions.

For 2 × 2 systems, Fig. 2 compares the BER-SNR performance of GCMB, FPMB, and a MIMO system using GC (denoted by GC) for different modulation schemes. The constellation precoder for FPMB is selected as the best one introduced in [9]. The worst-case decoding complexities of GC, FPMB, and GCMB are $O(M^{2.5})$, $O(M)$, and $O(\sqrt{M})$, respectively. Simulation results show that the three achieve very close performance for 4-QAM, 16-QAM, and 64-QAM. The performance differences among the three are less than 1 dB and become smaller as the modulation alphabet size increases. In fact, the performance loss mentioned in Section III-D is negligible in the 2 × 2 case.

For 4 × 4 systems, Fig. 3 compares the BER-SNR performance of PCMB, FPMB, and PC for 4-QAM and 16-QAM. The constellation precoder for FPMB is again chosen as the best one in [9]. Simulation results show that PCMB loses approximately 3 dB compared to PC and 1 dB compared to FPMB, and these degradations decrease as the modulation alphabet size increases. In exchange for these modest performance losses, PCMB substantially reduces the worst-case decoding complexity in the 4 × 4 case. The complexity drops from that of PC and FPMB (the latter being $O(M^3)$) to only $O(M^{1.5})$.

Executing SD in a lower dimension requires less computation, in both the worst case and the average case. Therefore, the worst-case decoding complexity is used above to roughly compare the complexity of PC, FPMB, and PCMB. To measure the average decoding complexity and give exact comparisons, the average number of real multiplications needed to decode one transmitted vector symbol is calculated at different SNRs for PC, FPMB, and PCMB. Real multiplications are the most expensive operations in terms of machine cycles. A reduced-complexity SD technique that substantially decreases the average number of real multiplications was introduced in [36], [37] and is employed in this paper.

Fig. 4 and Fig. 5 compare the complexity of GCMB with that of GC and FPMB, respectively, for 2 × 2 MIMO systems using 64-QAM. The complexity of GCMB is 99% lower than that of GC at low SNR and 48% lower at high SNR. Compared with FPMB, it is 70% lower at low SNR and close to FPMB at high SNR. Fig. 6 compares PCMB and PC for 4 × 4 MIMO systems using 4-QAM. The complexity of PCMB is 2.7 orders of magnitude lower than that of PC at low SNR and 1.7 orders lower at high SNR. Fig. 7 compares PCMB and FPMB for 4 × 4 MIMO systems using 16-QAM. The complexity of PCMB is 85% lower than that of FPMB at low SNR and close to that of FPMB at high SNR. The improvements will be much greater for larger alphabet sizes.
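To make the idea of counting real multiplications concrete, the sketch below implements a plain depth-first sphere decoder with Schnorr-Euchner ordering. It solves $\min_{\mathbf{x}}\|\mathbf{y} - \mathbf{R}\mathbf{x}\|^2$ for a real upper-triangular R and a real PAM alphabet. It is not the reduced-complexity SD cited above. The counting convention (one division counted as one multiplication) is our assumption, and `sphere_decode` is a hypothetical helper. The usage lines check the result against exhaustive search.

```python
import itertools
import math
import random


def sphere_decode(y, rmat, alphabet):
    """Depth-first sphere decoder (Schnorr-Euchner ordering) for
    min ||y - R x||^2 with real upper-triangular R and x in alphabet^n.
    Returns (x_hat, metric, number_of_real_multiplications)."""
    n = len(y)
    x = [0.0] * n
    best_x, best_d, mults = None, math.inf, 0

    def search(level, partial):
        nonlocal best_x, best_d, mults
        # Subtract the contribution of x[level+1:], which are already fixed.
        s = y[level]
        for c in range(level + 1, n):
            s -= rmat[level][c] * x[c]
            mults += 1
        center = s / rmat[level][level]
        mults += 1  # the division is counted as one multiplication
        # Visit candidates from nearest to farthest from the unconstrained estimate.
        for a in sorted(alphabet, key=lambda v: abs(v - center)):
            e = s - rmat[level][level] * a
            d = partial + e * e
            mults += 2
            if d >= best_d:
                break  # every remaining candidate is farther: prune them all
            x[level] = a
            if level == 0:
                best_x, best_d = list(x), d
            else:
                search(level - 1, d)

    search(n - 1, 0.0)
    return best_x, best_d, mults


# Usage: compare with exhaustive search on a random 4-layer real problem.
rng = random.Random(5)
pam = [-3.0, -1.0, 1.0, 3.0]
R = [[1.1, 0.2, -0.4, 0.3], [0.0, 0.9, 0.5, -0.2], [0.0, 0.0, 1.4, 0.1], [0.0, 0.0, 0.0, 0.8]]
y = [rng.gauss(0, 3) for _ in range(4)]
x_hat, d_hat, m = sphere_decode(y, R, pam)
brute = min(sum((y[r] - sum(R[r][c] * xv[c] for c in range(4))) ** 2 for r in range(4))
            for xv in itertools.product(pam, repeat=4))
print(x_hat, d_hat, brute, m)
```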

B. BICMB-PC 

As presented in Section IV, BICMB-PC has the greatest advantage in the complexity of calculating bit metrics in dimensions 2 and 4. Therefore, the simulations focus on these two dimensions.

For $R_c = 2/3$ and 2 × 2 systems, Fig. 8 compares the BER-SNR performance of BICMB-PC and BICMB-FP. The constellation precoder for BICMB-FP is selected as the best one introduced in [9]. The worst-case complexity of acquiring one bit metric is $O(M)$ for BICMB-FP and $O(\sqrt{M})$ for BICMB-PC. Simulation results show that the two achieve almost the same performance for 4-QAM, 16-QAM, and 64-QAM.

For $R_c = 4/5$ and 4 × 4 systems, Fig. 9 compares the BER-SNR performance of BICMB-PC and BICMB-FP for 4-QAM and 16-QAM. The constellation precoder for BICMB-FP is again chosen as the best one in [9]. Similarly, simulation results show that BICMB-PC achieves almost the same performance as BICMB-FP. Moreover, the worst-case complexity of obtaining one bit metric is $O(M^{1.5})$ for BICMB-PC, much lower than the $O(M^3)$ of BICMB-FP.

To measure the average decoding complexity, the average number of real multiplications needed to acquire one bit metric is calculated at different SNRs for BICMB-FP and BICMB-PC. An efficient reduced-complexity decoding technique for BICMB-FP was introduced in [38] and is applied in this paper. For a fair comparison, a similar decoding technique is employed for BICMB-PC.

Fig. 10 compares the complexity of BICMB-PC and BICMB-FP for 2 × 2 MIMO systems using 64-QAM. The complexity of BICMB-PC is 86% lower than that of BICMB-FP at low SNR and 70% lower at high SNR. Fig. 11 compares BICMB-PC and BICMB-FP for 4 × 4 MIMO systems using 16-QAM. The complexity of BICMB-PC is 1.7 orders of magnitude lower than that of BICMB-FP at low SNR and 1.3 orders lower at high SNR. The improvements will be much greater for larger alphabet sizes.
C. Discussion

As presented above, both PCMB and BICMB-PC offer advantages in decoding complexity. However, both require CSIT, which in practice is usually partial because of bandwidth limitations and imperfect because of channel estimation errors. Recently, limited CSIT feedback techniques have been introduced that achieve performance close to the perfect-CSIT case for both uncoded and coded SVD-based beamforming systems [39], [40], [41], [42]. In these techniques, a codebook of precoding matrices is known at both the transmitter and the receiver. The receiver selects the precoding matrix that satisfies a desired criterion, and only the index of that matrix is sent back to the transmitter. In practice, similar techniques can be applied to PCMB and BICMB-PC.

On the other hand, the performance of SVD-based MIMO systems with channel estimation errors was investigated in [43], where it was shown to be sensitive to these errors. Nevertheless, space-time coding is a way to improve the performance of beamforming with imperfect feedback. The spatial diversity provided by space-time coding is independent of CSIT. It therefore becomes dominant when the quality of CSI is low, and most of the performance gains then come from this spatial diversity. Both PCMB and BICMB-PC belong to this category.
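The limited-feedback loop described above can be sketched for a 2 × 2 channel. Everything in this sketch is hypothetical and is not the design of [39]-[42]. The codebook is a 2-point DFT matrix multiplied by diagonal phase rotations. The selection criterion picks the codeword whose effective Gram matrix $\mathbf{W}^H\mathbf{H}^H\mathbf{H}\mathbf{W}$ is closest to diagonal, meaning the subchannels are nearly decoupled, as with the SVD. Only the selected index, costing $\lceil\log_2 N\rceil$ bits for an N-entry codebook, is fed back.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hermitian(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def dft_codebook(size):
    """Hypothetical codebook: W_k = F diag(1, exp(j 2 pi k / size)), F = 2-point DFT."""
    s = 1 / math.sqrt(2)
    book = []
    for k in range(size):
        p = complex(math.cos(2 * math.pi * k / size), math.sin(2 * math.pi * k / size))
        book.append([[s, s * p], [s, -s * p]])
    return book


def coupling(h, w):
    """Off-diagonal magnitude of W^H H^H H W (hypothetical selection criterion)."""
    hw = mat_mul(h, w)
    g = mat_mul(hermitian(hw), hw)
    return abs(g[0][1])


def select_index(h, codebook):
    """Receiver side: index of the codeword with the least subchannel coupling."""
    return min(range(len(codebook)), key=lambda k: coupling(h, codebook[k]))


def feedback_bits(codebook):
    return math.ceil(math.log2(len(codebook)))


# Usage: the receiver knows H, picks an index, and feeds back only that index.
book = dft_codebook(8)
h = [[0.9 + 0.2j, -0.4 + 0.1j], [0.3 - 0.5j, 1.1 + 0.0j]]
k = select_index(h, book)
print(f"feed back index {k} using {feedback_bits(book)} bits")
```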

In this paper, the antenna configuration $N_t = N_r = S = D$ is considered for both PCMB and BICMB-PC, so that the numbers of transmit and receive antennas match the dimension of the PSTBC. Other antenna configurations are also valid as long as $D = S \le \min\{N_t, N_r\}$. More antennas yield larger singular values for the subchannels used to transmit the PSTBC, which improves performance. Similarly, when the channel is frequency-selective instead of flat fading, Orthogonal Frequency Division Multiplexing (OFDM) can be applied to increase the diversity of coded SVD-based beamforming techniques, which also enhances performance.

VI. Conclusion 

In this paper, two novel techniques, PCMB and BICMB-PC, are presented. PCMB combines PSTBCs with uncoded multiple beamforming, and BICMB-PC combines them with coded multiple beamforming. As a result, PCMB achieves full diversity, full multiplexing, and full rate at the same time. The main advantage of PCMB is that, in dimensions 2 and 4, it provides significantly lower decoding complexity than both PC and FPMB. These complexity gains come at the cost of some performance loss, which is negligible in dimension 2 and modest in dimension 4. Similarly, BICMB-PC achieves both full diversity and full multiplexing, with BER performance almost the same as that of BICMB-FP. The advantage of BICMB-PC is its much lower decoding complexity than BICMB-FP in dimensions 2 and 4. Therefore, BICMB-PC can be applied to replace the precoded part of BICMB-PP to reduce the decoding complexity. Performance investigations for limited feedback and frequency-selective channels are left as future work.
Appendix

Proof of real-valued R matrix for PCMB in dimension 4 

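The generation matrix G for the PSTBC in dimension 4 can be found in [13]. Let $\mathbf{f}_v$ denote the vth column of $\boldsymbol{\Lambda}\mathbf{G}$, and let $f_{u,v}$ denote the uth element of $\mathbf{f}_v$, with $u, v \in \{1, \ldots, 4\}$. Then

$$
\begin{aligned}
f_{u,1} &= \tfrac{1}{\sqrt{15}}\lambda_u\left[1 + i\left(-3 + \theta_u^2\right)\right],\\
f_{u,2} &= \tfrac{1}{\sqrt{15}}\lambda_u\left[\theta_u + i\left(-3\theta_u + \theta_u^3\right)\right],\\
f_{u,3} &= \tfrac{1}{\sqrt{15}}\lambda_u\left[\left(-3\theta_u + \theta_u^3\right) + i\left(-1 + 4\theta_u - \theta_u^3\right)\right],\\
f_{u,4} &= \tfrac{1}{\sqrt{15}}\lambda_u\left[\left(-1 - 3\theta_u + \theta_u^2 + \theta_u^3\right) + i\right],
\end{aligned}
\qquad (41)
$$

where

$$\theta_1 = 2\cos\left(\tfrac{4\pi}{15}\right), \quad \theta_2 = 2\cos\left(\tfrac{2\pi}{15}\right), \quad \theta_3 = 2\cos\left(\tfrac{16\pi}{15}\right), \quad \theta_4 = 2\cos\left(\tfrac{8\pi}{15}\right).$$

Note that $\theta_u^4 - \theta_u^3 - 4\theta_u^2 + 4\theta_u + 1 = 0$ for $u \in \{1, \ldots, 4\}$.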
The nonzero elements of the upper triangular matrix R are calculated as

(42)



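Based on (42), the R matrix is real-valued for PCMB in dimension 4, owing to a special property of the G matrix. By (41), every element in the uth row of $\boldsymbol{\Lambda}\mathbf{G}$ equals the common complex factor $\lambda_u[1 + i(\theta_u^2 - 3)]/\sqrt{15}$ times a real number. Hence all inner products $\mathbf{f}_v^H\mathbf{f}_w$ are real, so the Gram matrix $(\boldsymbol{\Lambda}\mathbf{G})^H\boldsymbol{\Lambda}\mathbf{G} = \mathbf{R}^H\mathbf{R}$ is real. Its Cholesky factor R, with positive diagonal, is therefore real-valued as well.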
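The claim can also be checked numerically. The sketch below builds the rows of G from the four column expressions $\tfrac{1}{\sqrt{15}}[1 + i(\theta_u^2 - 3)]$, $\tfrac{1}{\sqrt{15}}[\theta_u + i(\theta_u^3 - 3\theta_u)]$, $\tfrac{1}{\sqrt{15}}[(\theta_u^3 - 3\theta_u) + i(-1 + 4\theta_u - \theta_u^3)]$, and $\tfrac{1}{\sqrt{15}}[(-1 - 3\theta_u + \theta_u^2 + \theta_u^3) + i]$. It then draws random singular values $\lambda_u$, computes R by Gram-Schmidt, and reports the largest imaginary part. The singular values are random placeholders, not channel statistics from the paper, and the helper names are hypothetical. The accompanying checks also confirm that each $\theta_u$ satisfies the quartic above and that G is unitary.

```python
import math
import random

# theta_1..theta_4 in the order listed above; all four are roots of
# x^4 - x^3 - 4x^2 + 4x + 1.
THETAS = [2 * math.cos(k * math.pi / 15) for k in (4, 2, 16, 8)]


def g_row(t):
    """One row of G: the entries of (41) for a given theta, without lambda_u."""
    s = 1 / math.sqrt(15)
    return [s * complex(1, -3 + t**2),
            s * complex(t, -3 * t + t**3),
            s * complex(-3 * t + t**3, -1 + 4 * t - t**3),
            s * complex(-1 - 3 * t + t**2 + t**3, 1)]


def r_factor(a):
    """Upper-triangular R of a = QR (classical Gram-Schmidt, positive diagonal)."""
    n = len(a)
    q_cols, r = [], [[0j] * n for _ in range(n)]
    for j in range(n):
        v = [a[i][j] for i in range(n)]
        w = list(v)
        for k, q in enumerate(q_cols):
            r[k][j] = sum(qi.conjugate() * vi for qi, vi in zip(q, v))
            w = [wi - r[k][j] * qi for wi, qi in zip(w, q)]
        r[j][j] = complex(math.sqrt(sum(abs(wi) ** 2 for wi in w)))
        q_cols.append([wi / r[j][j] for wi in w])
    return r


def max_imag_of_r(seed):
    """Largest |Im| among the entries of R for random singular values lambda_u."""
    rng = random.Random(seed)
    lam = sorted((rng.uniform(0.2, 3.0) for _ in range(4)), reverse=True)
    lam_g = [[lam[u] * e for e in g_row(THETAS[u])] for u in range(4)]
    return max(abs(e.imag) for row in r_factor(lam_g) for e in row)


G = [g_row(t) for t in THETAS]
for seed in range(3):
    print(f"seed {seed}: max |Im R| = {max_imag_of_r(seed):.2e}")
```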
Fig. 1. Structure of (a) PCMB, (b) BICMB-PC, (c) CPMB (FPMB when P = S), and (d) BICMB-CP (BICMB-FP when P = S).
Fig. 2. BER vs. SNR of GC, FPMB, and GCMB for 2 × 2 systems.
Fig. 3. BER vs. SNR of PC, FPMB, and PCMB for 4 × 4 systems.
Fig. 4. Average number of real multiplications vs. SNR of GC and GCMB for 2 × 2 systems using 64-QAM.
Fig. 5. Average number of real multiplications vs. SNR of FPMB and GCMB for 2 × 2 systems using 64-QAM.





Fig. 7. Average number of real multiplications vs. SNR of FPMB and PCMB for 4 × 4 systems using 16-QAM.
Fig. 8. BER vs. SNR of BICMB-FP and BICMB-GC for $R_c = 2/3$, 2 × 2 systems.
Fig. 9. BER vs. SNR of BICMB-FP and BICMB-PC for $R_c = 4/5$, 4 × 4 systems.
Fig. 10. Average number of real multiplications vs. SNR of BICMB-FP and BICMB-GC for $R_c = 2/3$, 2 × 2 systems using 64-QAM.
Fig. 11. Average number of real multiplications vs. SNR of BICMB-FP and BICMB-PC for $R_c = 4/5$, 4 × 4 systems using 16-QAM.



